This project has two parts:
This was the first project in which I used APIs to gather data from systems! When building my script I utilized:
These Python packages are wrappers for the Riot Games API. I used them because I was unfamiliar with calling APIs, and these wrappers made it very easy to gather my data!
Using the Riot API and the Cassiopeia Python library, I was able to collect around 17,000 match entries from the KR, EUW, and NA servers. All of these matches were from the Challenger division.
The data collection process is one of the most important steps for any future analysis. I therefore used random sampling and gathered only one of the 10 participants from every match to ensure independence between my entries.
The process went as follows: I gathered all of the regions' Challenger players, got each player's most recent 20 games, and randomly selected one participant (not necessarily the player in question) from each match to add to my data. I also ensured that no match_id was used twice across the players' match histories.
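The sampling logic described above can be sketched in plain Python. This is only an illustration and not the original collection script: it makes no Riot API or Cassiopeia calls, and the function name `sample_matches`, the shape of `match_histories`, and the toy data are all assumptions made for the example.

```python
import random


def sample_matches(match_histories, games_per_player=20, seed=0):
    """Pick one random participant from every unique match.

    match_histories is a hypothetical structure: a dict mapping a player
    name to a list of matches (assumed to be ordered most recent first),
    where each match is {"match_id": ..., "participants": [...]}.
    """
    rng = random.Random(seed)
    seen_ids = set()
    rows = []
    for player, matches in match_histories.items():
        for match in matches[:games_per_player]:
            if match["match_id"] in seen_ids:
                continue  # this match was already sampled via another player
            seen_ids.add(match["match_id"])
            # The chosen participant need not be the player being iterated.
            participant = rng.choice(match["participants"])
            rows.append({"match_id": match["match_id"], "participant": participant})
    return rows


# Toy data: match 2 appears in both histories and must be sampled only once.
histories = {
    "player_a": [
        {"match_id": 1, "participants": ["a", "b", "c"]},
        {"match_id": 2, "participants": ["a", "d", "e"]},
    ],
    "player_b": [
        {"match_id": 2, "participants": ["a", "d", "e"]},
        {"match_id": 3, "participants": ["b", "f", "g"]},
    ],
}
sampled = sample_matches(histories)
```

Tracking the match IDs in a set is what keeps each match to a single row, so no two entries in the final data come from the same game.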
The goal is to use logistic regression and decision trees to predict LoL game outcomes by role, and to determine which variables add the most value to these models (i.e., the factors that cause the most gain).
Below are the features we will consider in our model development!
Our first steps include previewing our data, changing any data types, and splitting our dataset into 5 datasets, each corresponding to one of the positions in League of Legends:
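A minimal sketch of the per-role split with pandas, assuming the data lives in a DataFrame with a `role` column as in the summary that follows. The role labels (`"TOP"`, `"SUPPORT"`, and so on) and the toy rows are placeholders, since the actual label values are not shown here.

```python
import pandas as pd

# Toy stand-in for the full dataset; role labels are hypothetical.
df = pd.DataFrame({
    "role": ["TOP", "JUNGLE", "MID", "ADC", "SUPPORT", "SUPPORT"],
    "kills": [3, 5, 7, 9, 1, 2],
    "result": [True, False, True, False, True, False],
})

# Build one DataFrame per role, each with a fresh 0..n-1 index.
role_frames = {
    role: group.reset_index(drop=True)
    for role, group in df.groupby("role")
}

support = role_frames["SUPPORT"]
```

Keeping the pieces in a dictionary keyed by role makes it easy to loop over all five positions later with the same modelling code.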
## <class 'pandas.core.frame.DataFrame'>
## Int64Index: 5760 entries, 0 to 5759
## Data columns (total 21 columns):
## # Column Non-Null Count Dtype
## --- ------ -------------- -----
## 0 d_spell 5760 non-null int64
## 1 f_spell 5760 non-null int64
## 2 champion 5760 non-null object
## 3 side 5760 non-null object
## 4 role 5760 non-null object
## 5 assists 5760 non-null int64
## 6 damage_objectives 5760 non-null int64
## 7 damage_building 5760 non-null int64
## 8 damage_turrets 5760 non-null int64
## 9 deaths 5760 non-null int64
## 10 gold_earned 5760 non-null int64
## 11 kda 5760 non-null float64
## 12 kills 5760 non-null int64
## 13 level 5760 non-null int64
## 14 time_cc 5760 non-null int64
## 15 damage_total 5760 non-null int64
## 16 damage_taken 5760 non-null int64
## 17 total_minions_killed 5760 non-null int64
## 18 turret_kills 5760 non-null int64
## 19 vision_score 5760 non-null int64
## 20 result 5760 non-null bool
## dtypes: bool(1), float64(1), int64(16), object(3)
## memory usage: 950.6+ KB
We have no null records in any of our features (how lucky), so all we have to do is convert our spell variables to the object data type.
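One way to do that conversion, sketched on a toy frame: the column names `d_spell` and `f_spell` come from the summary above, while the values are made up for the example.

```python
import pandas as pd

# Toy frame; the spell IDs below are illustrative values only.
df = pd.DataFrame({"d_spell": [4, 14], "f_spell": [11, 4], "kills": [3, 5]})

# Summoner spell IDs are categories, not quantities, so store them as objects.
df = df.astype({"d_spell": "object", "f_spell": "object"})
```

Passing a dictionary to `astype` changes only the listed columns and leaves the remaining numeric columns untouched.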
We will revisit this later, but these graphs act as a snapshot in time of what the best players in North America were playing!
Our first role we will take a deeper look at is support!
## (1149, 21)
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| assists | 0 | 1 | 12.90 | 7.02 | 0 | 8.00 | 12.0 | 17.00 | 41 | ▆▇▅▁▁ |
| damage_objectives | 0 | 1 | 814.34 | 991.86 | 0 | 89.00 | 502.0 | 1156.00 | 7600 | ▇▂▁▁▁ |
| damage_building | 0 | 1 | 1985.46 | 2163.87 | 0 | 473.00 | 1330.0 | 2748.00 | 16552 | ▇▂▁▁▁ |
| damage_turrets | 0 | 1 | 814.34 | 991.86 | 0 | 89.00 | 502.0 | 1156.00 | 7600 | ▇▂▁▁▁ |
| deaths | 0 | 1 | 5.44 | 2.96 | 0 | 3.00 | 5.0 | 7.00 | 16 | ▆▇▅▂▁ |
| gold_earned | 0 | 1 | 7775.28 | 2147.87 | 2980 | 6282.00 | 7655.0 | 9073.00 | 16792 | ▃▇▅▁▁ |
| kda | 0 | 1 | 4.21 | 4.30 | 0 | 1.56 | 2.8 | 5.17 | 35 | ▇▁▁▁▁ |
| kills | 0 | 1 | 2.40 | 2.37 | 0 | 1.00 | 2.0 | 3.00 | 17 | ▇▂▁▁▁ |
| level | 0 | 1 | 11.91 | 2.23 | 6 | 10.00 | 12.0 | 13.00 | 18 | ▁▃▇▃▁ |
| time_cc | 0 | 1 | 27.89 | 14.59 | 2 | 18.00 | 25.0 | 36.00 | 105 | ▇▇▂▁▁ |
| damage_total | 0 | 1 | 24880.94 | 15387.32 | 3926 | 15616.00 | 20955.0 | 28809.00 | 149403 | ▇▂▁▁▁ |
| damage_taken | 0 | 1 | 14690.56 | 6735.89 | 2008 | 9833.00 | 13599.0 | 18399.00 | 60004 | ▇▇▁▁▁ |
| total_minions_killed | 0 | 1 | 27.14 | 14.97 | 0 | 17.00 | 27.0 | 35.00 | 126 | ▇▇▁▁▁ |
| turret_kills | 0 | 1 | 0.27 | 0.61 | 0 | 0.00 | 0.0 | 0.00 | 5 | ▇▁▁▁▁ |
| vision_score | 0 | 1 | 57.11 | 25.12 | 6 | 39.00 | 55.0 | 73.00 | 161 | ▃▇▅▁▁ |
The side distribution is very similar, which is good. Even though our study uses random sampling, we should still keep track of side, since an unequal distribution could hide any effect that side has on the outcome.
As we can see here, our response variable is slightly imbalanced in favor of losses, so the F1 score could be useful, as it is better suited to imbalanced datasets than plain accuracy. However, the difference is small (about 40 data points), so the ROC curve could be our best method of determining model performance.
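To make the two metrics concrete, here is a small dependency-free sketch of how F1 and the area under the ROC curve are computed. The helper names, the toy labels and scores, and the 0.5 threshold are all assumptions for illustration. In practice a library implementation such as scikit-learn's `f1_score` and `roc_auc_score` would be the usual choice.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (win) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """Probability that a random win is scored above a random loss (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: True = win, scores = hypothetical predicted win probabilities.
y_true = [True, False, True, False, False]
scores = [0.9, 0.4, 0.6, 0.7, 0.2]
y_pred = [s >= 0.5 for s in scores]  # assumed 0.5 decision threshold

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

F1 depends on the chosen threshold, while the ROC AUC summarizes ranking quality across all thresholds, which is one reason it is a reasonable default when the classes are close to balanced.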